Bridging Domains with Words: Opinion Analysis with Matrix Tri-factorizations

Authors

  • Tao Li
  • Vikas Sindhwani
  • Chris H. Q. Ding
  • Yi Zhang

Abstract

With the explosion of user-generated Web 2.0 content in the form of blogs, wikis and discussion forums, the Internet has rapidly become a massive, dynamic repository of public opinion on an unbounded range of topics. A key enabler of opinion extraction and summarization is sentiment classification: the task of automatically identifying whether a given piece of text expresses a positive or negative opinion towards a topic of interest. Building high-quality sentiment classifiers with standard text categorization methods is challenging because labeled data in the target domain is scarce. In this paper, we consider the problem of cross-domain sentiment analysis: can one, for instance, download rated movie reviews from rottentomatoes.com or IMDb discussion forums, learn the linguistic expressions and sentiment-laden terms that generally characterize opinionated commentary, and then successfully transfer this knowledge to the target domain, thereby building high-quality sentiment models without manual effort? We outline a novel sentiment transfer mechanism based on constrained non-negative matrix tri-factorizations of term-document matrices in the source and target domains. The constrained matrix factorization framework naturally incorporates document labels via a least squares penalty incurred by a certain linear model, and enables direct and explicit knowledge transfer across different domains. We obtain promising empirical results with this approach.
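The abstract describes the method only at a high level. As a rough illustration, the following is a minimal NumPy sketch of a label-constrained non-negative matrix tri-factorization X ≈ F S Gᵀ on a term-document matrix X that concatenates source and target documents; the word factor F is shared by both domains, which is what carries sentiment knowledge across in this sketch. Everything specific here is an assumption, not the authors' exact formulation: the objective ||X − F S Gᵀ||² + β Σ_i c_i ||G_i − Y_i||² (with c_i = 1 for labeled source documents and 0 otherwise), the multiplicative update rules, the label-based initialization of G, and the hypothetical helper names `tri_factorize` and `toy_corpus`.

```python
import numpy as np


def tri_factorize(X, Y, mask, k=2, beta=1.0, n_iter=300, eps=1e-9, seed=0):
    """Hypothetical sketch of label-constrained NMTF: X ~= F @ S @ G.T.

    Assumed objective: ||X - F S G^T||^2 + beta * sum_i mask_i * ||G_i - Y_i||^2.
    X:    (m terms, n docs) non-negative term-document matrix.
    Y:    (n, k) one-hot labels; rows of unlabeled docs are ignored.
    mask: (n,) 1.0 for labeled (source) docs, 0.0 for unlabeled (target) docs.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps   # word-cluster factor (shared across domains)
    S = rng.random((k, k)) + eps   # cluster-association factor
    G = rng.random((n, k)) + eps   # document-cluster factor
    labeled = mask > 0
    G[labeled] = Y[labeled] + 0.1  # assumption: start labeled docs near their labels
    C = mask[:, None]              # per-document mask, broadcast over the k columns
    for _ in range(n_iter):
        # Each step multiplies by (negative part) / (positive part) of the gradient,
        # so entries that start positive can never become negative.
        F *= (X @ G @ S.T) / (F @ S @ (G.T @ G) @ S.T + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ (G.T @ G) + eps)
        G *= (X.T @ F @ S + beta * C * Y) / (
            G @ S.T @ (F.T @ F) @ S + beta * C * G + eps
        )
    return F, S, G


def toy_corpus():
    """Hypothetical toy data: 20 terms in four groups of 5 (positive words,
    negative words, source-only topic words, target-only topic words)."""
    pos, neg, src, tgt = [np.arange(5 * g, 5 * g + 5) for g in range(4)]
    docs, labels = [], []
    for domain in (src, tgt):
        for label, senti in ((0, pos), (1, neg)):
            for _ in range(10):
                d = np.zeros(20)
                d[senti] = 5.0   # strong sentiment-bearing terms
                d[domain] = 1.0  # weaker domain-specific terms
                docs.append(d)
                labels.append(label)
    X = np.array(docs).T                       # (terms, docs) = (20, 40)
    labels = np.array(labels)
    mask = np.r_[np.ones(20), np.zeros(20)]    # first 20 docs (source) are labeled
    Y = np.eye(2)[labels] * mask[:, None]      # one-hot labels, zeroed for target docs
    return X, Y, mask, labels


if __name__ == "__main__":
    X, Y, mask, labels = toy_corpus()
    F, S, G = tri_factorize(X, Y, mask, k=2, beta=10.0, n_iter=500)
    pred = G.argmax(axis=1)  # predicted sentiment = strongest document cluster
    print("target accuracy:", (pred[20:] == labels[20:]).mean())
```

In this sketch, target documents are labeled by the argmax of their rows in G. On the toy corpus the shared sentiment terms dominate the domain-specific ones, so the labels anchored on the source side should carry over to the target documents; on real corpora, β, k and the number of iterations would need tuning.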

Similar resources

Discriminative Transfer Learning on Manifold

Collective matrix factorization has achieved remarkable success in document classification in the transfer learning literature. However, the learned latent factors still suffer from the divergence between different domains and thus are usually not discriminative enough for an appropriate assignment of category labels. Based on these observations, we impose a discriminative regression model over t...

Exploiting Associations between Word Clusters and Document Classes for Cross-Domain Text Categorization

Cross-domain text categorization aims to adapt the knowledge learnt from a labeled source domain to an unlabeled target domain, where the documents from the source and target domains are drawn from different distributions. However, in spite of the different distributions in raw word features, the associations between word clusters (conceptual features) and document classes may remain stab...

Riordan group approaches in matrix factorizations

In this paper, we consider an arbitrary binary polynomial sequence {A_n} and then give a lower triangular matrix representation of this sequence. As the main result, we obtain a factorization of the infinite generalized Pascal matrix in terms of this new matrix, using a Riordan group approach. Further, some interesting results and applications are derived.

Tensor Factorization towards Precision Medicine

Precision medicine initiatives come amid rapid growth in the quantity and variety of biomedical data, which exceeds the capacity of matrix-oriented data representations and many current analysis algorithms. Tensor factorizations extend the matrix view to multiple modalities and support dimensionality reduction methods that identify latent groups of data for meaningful summarization of both feat...

A Word Vector and Matrix Factorization Based Method for Opinion Lexicon Extraction

Automatic opinion lexicon extraction has attracted considerable attention, and many methods have been proposed. However, most existing methods depend on dictionaries (e.g., WordNet), which limits their applicability. For instance, dictionary-based methods are unable to find domain-dependent opinion words, because the entries in a dictionary are usually domain-independent. There also exist ...

Publication date: 2010